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Abstract 



The recently measured yeast transcriptional network is analyzed in terms of sim- 
plified Boolean network models, with the aim of determining feasible rule structures, 
given the requirement of stable solutions of the generated Boolean networks. We 
find that for ensembles of generated models, those with canalyzing Boolean rules 
are remarkably stable, whereas those with random Boolean rules are only marginally 
stable. Furthermore, substantial parts of the generated networks are frozen, in the 
sense that they reach the same state regardless of initial state. Thus, our ensemble 
approach suggests that the yeast network shows highly ordered dynamics. 
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1 Introduction 



The regulatory network for Saccharomyces cerevisiae was recently measured [1] for 
106 of the 141 known transcription factors by determining the bindings of tran- 
scription factor proteins to promoter regions on the DNA. Associating the promoter 
regions with genes yields a network of directed gene-gene interactions. As described 
in [1,2] the significance of measured bindings with regard to infering putative interac- 
tions are quantified in terms of P values. The authors of [1] did not infer interactions 
having P values above a threshold value Pth = 0.001 for most of their analysis. Small 
threshold values Pth correspond to a small number of inferred interactions with high 
quality, whereas larger values correspond to more inferred connections but of lower 
quality. It was found that for the Pth = 0.001 network, the fan-out from each tran- 
scription factor to its regulated targets is substantial, on the average 38 [1]. From 
the underlying data (website: |http : / / web . wi . mit . edu / young / regulatory _networkj ) one 
finds that fairly few signals feed into each of them; on the average 1.9. The experi- 
ments yield the regulatory network architecture but neither the interaction rules at 
the nodes, nor the dynamics of the system, nor its final states. 

With no direct experimental results on the states of the system, there is of course 
no systematic method to pin down the interaction rules, not even within the frame- 
work of simplified and coarse-grained genetic network models, e.g. ones where the 
rules are Boolean. One can nevertheless attempt to investigate to what extent the 
measured architecture can select between classes of Boolean models [3] , based upon 
criteria of stability. 

We generate ensembles of different model networks on the given architecture, and 
analyze their behavior with respect to stability. In a stable system small initial per- 
turbations should not grow in time. This is investigated by monitoring how the Ham- 
ming distances between different initial states evolve in a "Derrida plot" [4]. If small 
Hamming distances diverge in time, the system is unstable and vice versa. Based 
upon this criterion we find that synchronously updated random Boolean networks 
(with a flat rule distribution) are marginally stable on the transcriptional network of 
yeast. 

Using a subset of Boolean rules, nested canalyzing functions (see sect. 12. 2j) . the 
ensemble of networks exhibits remarkable stability. The notion of nested canalyzing 
functions is introduced to provide a natural way of generating canalyzing rules, which 
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are abundant in biology [5]. Furthermore, it turns out that for these networks, 
there exists a fair amount of forcing structures [3], where non-neghgible parts of the 
networks are frozen to fixed final states regardless of the initial conditions. Also, 
we investigate the consequences of rewiring the network while retaining the local 
properties; the number of inputs and outputs for each node [6]. 

To accomplish the above, some novel tools and techniques were developed and 
used. In order to include more interactions than those in the Pth = 0.001 network [1], 
we investigate how network properties, local and global, change as Pth is increased. 
We find a transition slightly above Pth = 0.005, indicating the onset of noise in 
the form of biologically irrelevant inferred connections. In [5] extensive literature 
studies revealed that, for eukaryotes, the rules seem to be canalyzing. We develop a 
convenient method to generate a distribution of canalyzing rules, that fits well with 
the list of rules in [5]. 

2 Methods and Models 

2.1 Choosing Network Architecture 

In [1], P values were calculated as measures of confidence in the presence of an 
interaction. With further elucidation of noise levels, one might increase the threshold 
for P values from the value 0.001 used in [1]. To this end we compute various network 
properties, to investigate if there is any value of Pth for which these properties exhibit 
a transition that can be interpreted as the onset of noise. In Fig. 1 the number of 
nodes, mean connectivity, mean pairwise distance (radius) and fraction of node pairs 
connected are shown. As can be seen, there appears to be a transition slightly 
above Pth = 0.005. In what follows we therefore focus on the network defined by 
Pth = 0.005. Furthermore, we (recursively) remove genes which have no outputs to 
other genes, since these are not relevant for the network dynamics. The resulting 
network is shown in Fig. 2. 

2.2 Generating Rules 

In [1], the architecture of the network is determined, but not the specific rules for 
the interactions. In order to investigate the dynamics on the measured architecture. 
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we repeatedly assign a random Boolean rule to each node in the network. We use 
two rule distributions; one null-hypothesis and one distribution that agrees with rules 
compiled from the literature [5] (see Supporting Information) . In both cases we ensure 
that every rule depends on all of its inputs, since the dependence should be consistent 
with the network architecture. 

As a null-hypothesis, we use a flat distribution among all Boolean functions that 
depend on all inputs. For rules with a few inputs, this will create rules that can be 
expressed with normal Boolean functions in a convenient way. In the case of many 
inputs, most rules are unstructured and the result of toggling one input value will 
appear random. 

In biological systems, the distribution of rules is likely to be structured. Indeed all 
of the compiled rules in [5] are canalyzing [3]; a canalyzing Boolean function [3] has 
at least one input, such that for at least one input value, the output value is fixed. 
It is not straightforward to generate biologically relevant canalyzing functions. A 
canalyzing rule implies some structure, but the function of the non- canalyzing inputs 
(when the canalyzing inputs are clamped to their non-canalyzing values) could be as 
disordered as the full set of random Boolean rules. However, the canalyzing structure 
is repeated in a nested fashion for almost all rules in [5]. Hence, we introduce the 
concept of nested canalyzing functions (see Appendix), which can be used to generate 
distributions of canalyzing rules. Actually, of the 139 rules in [5] only 6 are not nested 
canalyzing functions (see Supporting Information) . 

A special case of nested canalyzing functions is the recently introduced notion of 
chain functions [7] (see Appendix) . Chain functions are the most abundant form of 
nested canalyzing functions, but 32 of the 139 rules in [5] fall outside this class. 

It turns out that the rule distribution of nested canalyzing functions in [5] can be 
well described by a model with only one parameter (see Appendix). Hence, we use 
this model to mimic the compiled rule distribution. The free parameter determines 
the degree of asymmetry between active and inactive states, and its value reflects the 
fact that most genes are inactive at any given time in a gene regulatory system. 

2.3 Analyzing the Dynamics 

A biological system is subject to a substantial amount of noise, making robustness 
a necessary feature of any model. We expect a transcriptional network to be stable, 



5 



in that a random disturbance can not be allowed to grow uncontrollably. Gene 
expression levels can be approximated as Boolean, as genes tend to be either active 
or inactive. This approximation for genetic networks is presumably easier to handle 
for stability issues than for general dynamical properties. Using synchronous updates 
is computationally and conceptually convenient though it may at first sight appear 
unrealistic. However, in instances of strong stability, the update order should not be 
very important. 

To study the time development of small fluctuations in this discrete model with 
synchronous updating, we investigate how the Hamming distance between two states 
evolves with time. In a Derrida plot [4] pairs of initial states are sampled at defined 
initial distances H{0) from the entire state space, and their mean Hamming distance 
H{t) after a fixed time t is plotted against the initial distance H{0). The slope in the 
low-H region indicates the fate of a small disturbance. If the curve is above/below the 
line H(t) = H{0) it reflects instability/stability in the sense that a small disturbance 
tend to increase/decrease during the next t time steps (see Fig. 3). 

It is not uncommon that transcription factors control their own expression. In 
some cases genes up-regulate themselves, with the effect that their behavior becomes 
less linear and more switch-like. This is readily mimicked in a Boolean network. 
However, in the other case, where a transcription factor down-regulates itself, the 
system will be stabilized in a model with continuous variables, provided that the 
time delay of the self-interaction is not too large. Boolean networks can only model 
the limit of large time delays, which gives rise to nodes that in an unbiological manner 
repeatedly flip between no activity and full activity without requiring any external 
input. Thus, the self-interactions need to be treated as a special case in the Boolean 
approximation. To this end, we consider three different alternatives: 

1. View the self-interactions as internal parts of the rules; all self-interactions are 
removed. 

2. Remove the possibility for self- interactions to be down-regulating. 

3. No special treatment of self-interactions. 

It is natural to use alternative Q as a reference point in order to understand the effect 
of the self-interactions in alternatives El and El 

We want to examine how the geometry of networks influence the dynamics. It is 
known [3] that the distributions of in- and out-connectivities of the nodes strongly 
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affect tlie dynamics in Boolean networks, but liow important is tlie overall archi- 
tecture? If for each node we preserve the connectivities, but otherwise rewire the 
network randomly [6], how is the dynamics affected? For a Derrida plot with t = 1, 
there is no change. If we only take a single time step from a random state, the outputs 
will not have time to be used as inputs. There will be correlations between nodes, 
but the measured quantity H{1) is a mean over all nodes, and this is not affected 
by these correlations. Hence, H{1) is not changed by the rewiring. In order to get a 
better picture of the dynamics we need to increase t. However, if we go high enough 
in t to probe larger structures in the networks, we lose sight of the transient effects 
of a perturbation. 

To remedy this, we opt to select a fixed initial Hamming distance H{0), and 
examine the expectation value of the distance as a function of time, using the nested 
canalyzing rules. As noise entering the biological network would act on the current 
state of the system rather than on an entirely random one, we select one of the states 
to be a fixed point of the dynamics, and let the probability of any given fixed point be 
proportional to the size of its attractor basin. A graph of H{t) shows the relaxation 
behavior of the perturbed system where the self-interactions have been removed (see 
Fig. 4a). We investigate the role of the self- interactions both in terms of relaxation 
of a perturbed fixed point (see Fig. 4b) and in terms of probabilities for random 
trajectories to end up in distinct fixed points and cycles. 

The assumption that the typical state of these networks is a fixed point can be 
motivated. A forcing connection [3] is a pair of connected nodes, such that control 
over a single input to one node is sufficient to force the output of the other node to 
one of the Boolean values. With canalyzing rules, this is fulfilled when the canalyzed 
output of the first node is a canalyzing input to the second. The existence of forcing 
structures implies stability, as a (forcing) signal traveling through such a structure 
will block out other inputs and is thereby likely to cause information loss. Abundant 
forcing structures should tend to favor fixed points. 

3 Results and Discussion 

Despite absence of knowledge about initial and final states, we have been able to get 
a hint about possible interaction rules within a Boolean network framework for the 
yeast transcriptional network. Our findings are: 
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Canalyzing Boolean rules confer far more stability than rules drawn from a 
flat distribution as is clear from the Derrida plots in Fig. 3. Yet, even a fiat 
distribution of Boolean functions yields marginal stability. 

The dynamical behavior around fixed points is more stable for the measured 
networlc than for the rewired ones, though only in the early time evolution (2-3 
time steps) of the systems (see Fig. 4a). The behavior at this time scale can 
be expected to depend largely on small network motifs, whose numbers are 
systematically changed by the rewiring [6]. 

The removal of self-couplings increases the stability in these networks. However, 
the relaxation is only changed significantly if we allow the toggling of self- 
interacting nodes (see Fig. 4b). This means that a node with a switch-like 
self-interaction is not likely to be toggled by its inputs during the relaxation. 
Nor do the down-regulating self-interactions alter the relaxation. This means 
that the overall properties of relaxation to fixed points can be investigated 
regardless of how the self-interactions should be modeled. 

The number of attractors and their length distribution are strongly dependent 
on how the self-interactions are modeled. The average numbers of distinct fixed 
points per rule assignments found in 1000 trials of different trajectories are 1.02, 
4.33 and 3.79, respectively, for the three self-interaction models. The numbers 
of 2-cycles are 0.02, 0.09 and 0.38, respectively. Longer cycles are less common; 
in total they sum up to 0.03, 0.11 and 0.11, respectively. 

Forcing structures [3] are prevalent for this architecture with canalyzing rules, 
as is evident from Fig. 2. On average 56 % of the couplings belong to forcing 
structures. As a consequence, most nodes will be forced to a fixed state regard- 
less of the initial state of the network. Even the highly connected nodes (in 
the center of the network) will be forced to a fixed state for a vast majority of 
the random rule assignments. In most cases, the whole network will be forced 
to a specific fixed state. At first glance this might seem un-biological. How- 
ever, in the real world there are more inputs to the system than the measured 
transcription factors, and to study a process such as the cell cycle, one may 
need to consider additional components of the system. With more inputs such 
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a strong stability — of the measured part of the network — may be necessary 
for robustness of the entire system. 

Future reverse engineering projects in transcriptional networks may be based on 
the restricted pool of nested canalyzing rules, which have been shown to generate 
very robust networks in this case. It should be pointed out that the notion of nested 
canalyzing functions is not intrinsically Boolean. For instance, the same concept can 
be applied to nested sigmoids. 
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Appendix: Nested Canalyzing Functions 

The notion of nested canalyzing functions is a natural extension of canalyzing func- 
tions. Consider a /C-input Boolean rule R with inputs ii, . . . ,1^ and output o. R 
is canalyzing on the input im if there are Boolean values and Om such that 
im = Im ^ o = Om- I-m is the caualyziug value, and Om is the canalyzed value for 
the output. 

For each canalyzing rule i?, renumber the inputs in a way such that R is canalyzing 
on ii. Then, there are Boolean values Ii and Oi such that zi = Ji o = Oi. To 
investigate the case ii = not Ii, fix ii to this value. This defines a new rule i?i with 
K — 1 inputs; 12-, ■ ■ ■ i^k- In most cases, when picking R from compiled data, Ri is 
also canalyzing. Then, renumber the inputs in order for Ri to be canalyzing on 12. 
Fixing 12 = NOT I2 renders a rule R2 with the inputs is, ... , ir- As long as the rules 
R, Ri, R2, ■ ■ ■ are canalyzing, we can repeat this procedure until we find Rk~i which 
has only one input ik and hence is trivially canalyzing. Such a rule i? is a nested 
canalyzing function and can be described by the canalyzing input values Ii, . . . ,Ik 
together with their respective canalyzed output values Oi, . . . , Ok and an additional 
value Odcfauit- The output is given by 

d if zi = h 

02 if ii 7^ h AND Z2 = h 

03 if ii ^ h AND Z2 7^ h AND is = I3 

Ok if ii 7^ h and ■ • ■ and iK-i ^ Ik~i and = Ik 

Odefault if ii 7^ h AND ■ ■ • AND iK ^ Ik ■ 

The notion of chain functions in [7] is equivalent to nested canalyzing functions that 
can be written on the form Ji = • ■ • = Ik-i = FALSE. 

We want to generate a distribution of rules with K inputs, such that all rules 
depend on every input. The dependency requirement is fulfilled if and only if 
Odcfauit = NOT Ok- Then, it remains to choose values for Ii, Ik and Oi, . . . , Ok- 
These values are independently and randomly chosen with the probabilities 

^ <r^ ^ exp(-2--a) 

p{Im = true) = p{Om = true) 



1 +exp(-2-'"a) 
ioY m = 1, . . . , K. For all generated distributions, we let a = 7. 
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The described scheme is sufficient to generate a well-defined rule distribution, 
but each rule has more than one representation in Ii, . . . ,Ik and Oi, . . . , Ok- In 
Supporting Information we describe how to obtain a unique representation, which is 
applied to the rules compiled in [5]. This enables us to present a firm comparison 
between the generated distribution and the list of rules in [5]. 
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Figure captions 



Fig. 1. Topological properties of the yeast regulatory network in [1] for different 
P value thresholds: Number of nodes (solid line), mean connectivity (dotted line), 
mean pairwise distance [radius] (dotted-solid line) and fraction of node pairs that 
are connected (dashed line). The right y-axis corresponds to the number of nodes 
with no outputs, whereas the other quantities are indicated on the left y-axis. Self- 
couplings were excluded, but the figure looks similar when they are included. The 
dashed vertical line marks the threshold Pth = 0.005. 

Fig. 2. The Pth = 0.005 network excluding nodes with no outputs to other nodes 
than itself. The filled areas in the arrow-heads are proportional to the probability of 
each coupling to be in a forcing structure when the nested canalyzing rules are used 
on the network without self-interactions. This probability ranges from approximately 
1/4 for the inputs to YAP6 to 1 for the inputs to one-input nodes. Nodes that will 
reach a frozen state (on or off) in the absence of down-regulating self-interactions, 
regardless of the choice of rules, are shown in dashed. For the other nodes, the 
grey scale indicates the probability of being frozen in the absence of self-interactions, 
ranging from just under 97% (bold black) to over 99.9 % (gray). 

Fig. 3. Evolution of different Hamming distances i?(0) with one time step to 
H{1) (Derrida plots [4]) for random rules (dark grey) and nested canalyzing rules 
(light grey) with and without self-couplings (dashed borders) respectively. (Down- 
regulating self-couplings are allowed.) The bands correspond to la variation among 
the different rule assignments generated on the architecture in Fig. 2. Statistics were 
gathered from 1000 starts on each of 1000 rule assignments. 

Fig. 4. The average time evolution of perturbed fixed points for nested canalyzing 
rules, starting from Hamming distance -ff(O) = 5; (a) impact of the network archi- 
tecture and (b) impact of the self-interactions. The lines marked with circles in both 
figures correspond to the network in Fig. 2 without self-interactions. The grey lines in 
(a) show the relaxation for 26 different rewired architectures with no self-interactions, 
with Icr errors of the calculated means indicated by the line widths. The black lines 
in (b) correspond to the network in Fig. 2 with self-interactions. The upper line 
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shows the case when it is allowed to toggle nodes with self-interactions as a state at 
H{0) = 5 is picked, while the lower line shows the relaxation if this is not allowed. 
The widths of these lines show the difference between allowing self-interactions to be 
repressive or not. 
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Supporting Information 



Random Boolean Network Models and the Yeast Transcriptional Network 
S. Kauffman, C. Peterson, B. Samuelsson and C. Troein 

Confronting Nested Canalyzing Functions with Compiled Data 

In order to compare compiled and generated distributions of rules, we must ensure 
that every nested canalyzing function is always represented by the same set of param- 
eters Ii, . . . ,Ik and Oi, . . . , Ok (see Appendix in the printed article). All ambiguities 
in the choice of the representation can be derived from the following operations: 

1. The transformation NOT together with Ok — ^ NOT Ok and Odcfauit 

NOT Odcfauit- 

2. Permutations among a set of inputs im, • • • , im+p such that Om = ■ ■ ■ = O^+p. 
The values of Im, • • • , Im+p are permutated in the same way as im, ■ ■ ■ , im+p- 

A unique representation is created from any choice of parameters in two steps. First, 
1. is applied if Ok Ok-i, which ensures that Ok = Ok-i- In order to handle the 
special case = 1 in a convenient way we define Oq = FALSE. Second, all intervals of 
inputs im, ■ ■ ■ , im+p such that 2. can be applied are identified and permutated so that 
Im = ■ ■ ■ = Im+q = FALSE and Im+q+1 = ■ ■ ■ = Im+p = TRUE for somc q,0 <q <p. 

Using the above described procedure, we can compare a generated rule distribution 
with the compiled distribution. First, we take away all redundant inputs of each 
observed rule. An input is redundant if the output is never dependent on that input. 
Starting from 66, 45 and 22 nested canalyzing rules with 3, 4 and 5 inputs respectively, 
the reduction renders 2, 9, 71, 35 and 16 such rules with 1, 2, 3, 4 and 5 inputs 
respectively. Second, we let a = 7 and generate rule distributions for each number 
of inputs, (a = 7 is not based on a precise fit, it was picked by hand to fit the 
distribution oi Ii, . . . , Ik-) Tabled shows the result for the most frequently observed 
rules, and Fig. ^ is a plot of the full rule distribution. The calculated distribution fits 
surprisingly well to the compiled one, considering that the model has only one free 
parameter, a. 
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Table 1: The list of nested canalyzing rules observed more than once in [5]. nobs is 
the number of observations in the compiled list of rules, whereas rzcaic is the average 
number of rules in the generated distribution. Each rule is described both as an 
ordinary Boolean expression, and with the parameters Ii, . . . ,Ik and Oi, . . . , Ok, 
where Odefauit = not Ok- and 1 correspond to false and true, respectively. 
The labels serve as references in Fig. 1, and capital labels mark rules that are chain 
functions, (not has higher operator precedence than and, whereas the precedences 
of OR and XOR are lower.) 
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(0- 


0), (0^ 


o),(i- 




(1- 


-1) 


il AND i2 AND i^ AND (^4 OR is) 


1 


(0- 


0), 


(0- 


0), (0^ 


i),(o- 


-^1), 


(0- 


-1) 


il AND i2 AND NOT (^3 AND i^ AND ^5) 


1 


(0- 


0), 


(0- 


0), (1^ 


0), (0- 


-^1), 


(0- 


-1) 


il AND i2 AND NOT (^3 OR AND is) 


1 


(0- 


0), 


(0- 


0), (1^ 


o),(i- 


-^1), 


(0- 


-1) 


il AND i2 AND NOT ^3 AND (^4 OR NOT is) 


1 


(0- 


0), 


(0- 


1), (0^ 


i),(o- 


-^1), 


(1- 


-1) 


il AND NOT (^2 AND ^3 AND ^4 
AND NOT is) 


1 


(0- 


0), 


(1- 


0), (1^ 


i),(o- 


-o),(i- 


-0) 


il AND NOT i2 AND (i3 OR i4 AND NOT is) 


1 


(0- 


0), 


(0- 


0), (non 


-canalyzing 






il AND i2 AND (i3 XOR i4) 


1 


(0- 


0), 


(non 


-canalyzing) 








il AND (i2 XOR i3 AND i4) 


1 


(0- 


0), 


(noD 


-canalyzing) 








il AND (2 <)(i2,i3,NOT ^4) 


1 


(1- 


0), 


(noD 


-canalyzing) 








NOT il AND (i2 AND NOT i^ 



OR i3 AND NOT (i4 OR is)) 

Table 2: Continuation of Table 1, containing the remainder of rules listed in [5]. The 
Boolean function (2 <) is true if at least two of its arguments are true. 
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Figure 1: Compiled and generated rule distributions of nested canalyzing functions. 
The gray half-circles have an area proportional to the number of times each rule has 
been observed, while their black counterparts reflect the calculated distribution. The 
labeled rules are listed in Table 1. Capital labels mark rules that are chain functions. 
Each rule is assigned a coordinate in the unit square above (having (0, 0) as its lower 
left corner), according to x = 1/2 + E^=i 2-™0(/„^), y = 1/2 + E™=i 2-'"0(O„), 
where 0(true) = 1/2 and </)(false) = —1/2. The crosses mark the possible coor- 
dinates for a rule that is represented in its unique form. The lines indicate how the 
coordinates can change when new inputs are added to an existing rule. 
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